NSF PAR Search | NSF Public Access Repository


CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering

https://doi.org/10.1109/BIBM52615.2021.9669300

Yue, Xiang; Zhang, Xinliang; Yao, Ziyu; Lin, Simon; Sun, Huan (December 2021, 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts. Studies show that neural QA models trained on one corpus may not generalize well to new clinical texts from a different institution or a different patient group, where large-scale QA pairs are not readily available for model retraining. To address this challenge, we propose a simple yet effective framework, CliniQG4QA, which leverages question generation (QG) to synthesize QA pairs on new clinical contexts and boosts QA models without requiring manual annotations. To generate the diverse types of questions that are essential for training QA models, we further introduce a seq2seq-based question phrase prediction (QPP) module that can be used together with most existing QG models to diversify the generation. Our comprehensive experimental results show that the QA corpus generated by our framework can improve QA models on the new contexts (up to 8% absolute gain in terms of Exact Match), and that the QPP module plays a crucial role in achieving the gain.
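For readers unfamiliar with the Exact Match metric cited above, the following is a minimal sketch of how it is commonly computed in extractive QA. It assumes the SQuAD-style convention (lowercasing, then stripping punctuation and English articles before comparison); the abstract does not state which normalization the authors used, and the function names `normalize`, `exact_match`, and `corpus_exact_match` are illustrative only. An "absolute gain" of 8% means a difference of 8 percentage points between two such corpus-level scores.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace.

    This mirrors the common SQuAD-style normalization; it is an assumption,
    not a description of the paper's evaluation script.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Return 1.0 if the prediction equals any gold answer after normalization."""
    pred = normalize(prediction)
    return float(any(pred == normalize(gold) for gold in gold_answers))


def corpus_exact_match(predictions, references):
    """Average Exact Match over a corpus, reported as a percentage."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    preds = ["The aspirin.", "5 mg"]
    golds = [["aspirin"], ["10 mg"]]
    print(corpus_exact_match(preds, golds))
```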
Full Text Available
COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval

https://doi.org/10.18653/v1/2021.emnlp-main.305

Zhang, Xinliang; Sun, Heming; Yue, Xiang; Lin, Simon; Sun, Huan (November 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing)

We present a large, challenging dataset, COUGH, for COVID-19 FAQ retrieval. Similar to a standard FAQ dataset, COUGH consists of three parts: FAQ Bank, Query Bank and Relevance Set. The FAQ Bank contains ~16K FAQ items scraped from 55 credible websites (e.g., CDC and WHO). For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query. We analyze COUGH by testing different FAQ retrieval models built on top of BM25 and BERT, among which the best model achieves 48.8 under P@5, indicating a great challenge presented by COUGH and encouraging future research for further improvement. Our COUGH dataset is available at https://github.com/sunlab-osu/covid-faq.
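The P@5 figure above is precision at rank 5: the fraction of the top five retrieved FAQ items that are annotated as relevant to the query, averaged over queries. A minimal sketch follows, assuming each query has a ranked list of FAQ identifiers and a set of relevant identifiers; the function names and the toy identifiers are hypothetical and do not come from the COUGH codebase.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    # Dividing by k (not by len(top_k)) penalizes systems that return fewer than k items.
    return hits / k


def mean_precision_at_k(rankings, relevance, k=5):
    """Average P@k over all queries present in `rankings`."""
    scores = [precision_at_k(rankings[q], relevance[q], k) for q in rankings]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: two queries, five retrieved FAQ ids each.
    rankings = {
        "q1": ["a", "b", "c", "d", "e"],
        "q2": ["x", "y", "z", "w", "v"],
    }
    relevance = {"q1": {"a", "c"}, "q2": {"x"}}
    print(mean_precision_at_k(rankings, relevance, k=5))
```

A reported score of 48.8 would correspond to this mean expressed as a percentage, meaning that on average slightly fewer than half of the top five results are relevant.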
Full Text Available
Clinical Phrase Mining with Language Models

https://doi.org/10.1109/BIBM49941.2020.9313496

Mani, Kaushik; Yue, Xiang; Gutierrez, Bernal Jimenez; Huang, Yungui; Lin, Simon; Sun, Huan (December 2020, 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
Full Text Available
Rationalizing Medical Relation Prediction from Corpus-level Statistics

Wang, Zhen; Lee, Jennifer; Lin, Simon; Sun, Huan (July 2020, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20))

Full Text Available
Reinforcement of Natural Rubber Latex Using Jute Carboxycellulose Nanofibers Extracted Using Nitro-Oxidation Method

https://doi.org/10.3390/nano10040706

Sharma, Sunil K.; Sharma, Priyanka R.; Lin, Simon; Chen, Hui; Johnson, Ken; Wang, Ruifu; Borges, William; Zhan, Chengbo; Hsiao, Benjamin S. (April 2020, Nanomaterials)

Synthetic rubber produced from nonrenewable fossil fuel requires high energy costs and depends on a petroleum price that is presumed to be unstable. Natural rubber latex (NRL) is one of the major alternative sustainable rubber sources since it is derived from the plant Hevea brasiliensis. Our study focuses on integrating sustainably processed carboxycellulose nanofibers from untreated jute biomass into NRL to enhance the mechanical strength of the material for various applications. The carboxycellulose nanofibers (NOCNF), having a carboxyl content of 0.94 mmol/g, were prepared and integrated in their nonionic form (–COONa) for higher dispersion in water, to increase the interfacial interaction between NRL and NOCNF. Transmission electron microscopy (TEM) and atomic force microscopy (AFM) analyses of NOCNF showed that the average dimensions of the nanofibers were length (L) = 524 ± 203 nm, diameter (D) = 7 ± 2 nm and thickness = 2.9 nm. Furthermore, Fourier transform infrared spectrometry (FTIR) analysis of NOCNF confirmed the presence of carboxyl groups. Dynamic light scattering (DLS) measurement of NRL showed an effective diameter of about 643 nm with a polydispersity of 0.005. The mechanical properties of NRL/NOCNF films were determined by tensile testing at various concentrations of NOCNF in the NRL, and the results showed an increasing trend of enhancement. With increasing NOCNF concentration, the film modulus was found to increase quite substantially, but the elongation-to-break ratio decreased drastically. The presence of NOCNF changed the NRL film from elastic to brittle. However, at the NOCNF overlap concentration (0.2 wt. %), the film modulus seemed to be the highest.
Full Text Available
SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

https://doi.org/10.1145/3292500.3330894

Wang, Zhen; Yue, Xiang; Moosavinasab, Soheil; Huang, Yungui; Lin, Simon; Sun, Huan (January 2019, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining)

Full Text Available
Assessment of 135 794 Pediatric Patients Tested for Severe Acute Respiratory Syndrome Coronavirus 2 Across the United States

https://doi.org/10.1001/jamapediatrics.2020.5052

Bailey, L. Charles; Razzaghi, Hanieh; Burrows, Evanette K.; Bunnell, H. Timothy; Camacho, Peter E.; Christakis, Dimitri A.; Eckrich, Daniel; Kitzmiller, Melody; Lin, Simon M.; Magnusen, Brianna C.; et al (February 2021, JAMA Pediatrics)
Full Text Available
Graph embedding on biomedical networks: methods, applications and evaluations

https://doi.org/10.1093/bioinformatics/btz718

Yue, Xiang; Wang, Zhen; Huang, Jingong; Parthasarathy, Srinivasan; Moosavinasab, Soheil; Huang, Yungui; Lin, Simon M; Zhang, Wen; Zhang, Ping; Sun, Huan; et al (October 2019, Bioinformatics)

Motivation: Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods have been evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art.

Results: We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug–disease association (DDA) prediction, drug–drug interaction (DDI) prediction and protein–protein interaction (PPI) prediction; and on 2 node classification tasks: medical term semantic type classification and protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDA, DDI and protein function prediction, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.

Availability and implementation: As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks.

Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
